PDBProp - Working With a Single PDB Structure¶
This notebook gives a tutorial of the PDBProp object, specifically how chains are handled and how to map a sequence to it.
Imports¶
In [ ]:
from ssbio.databases.pdb import PDBProp
from ssbio.databases.uniprot import UniProtProp
In [ ]:
import sys
import logging
In [ ]:
# Create logger
logger = logging.getLogger()
logger.setLevel(logging.DEBUG) # SET YOUR LOGGING LEVEL HERE #
In [ ]:
# Other logger stuff for Jupyter notebooks
handler = logging.StreamHandler(sys.stderr)
formatter = logging.Formatter('[%(asctime)s] [%(name)s] %(levelname)s: %(message)s', datefmt="%Y-%m-%d %H:%M")
handler.setFormatter(formatter)
logger.handlers = [handler]
Basic methods¶
In [ ]:
my_structure = PDBProp(ident='5T4Q', description='E. coli ATP synthase')
Download the structure¶
Downloading will: - Download the file type of choice to the specific output directory - Parse the PDB header file to fill out the metadata fields
In [ ]:
import tempfile
my_structure.download_structure_file(outdir=tempfile.gettempdir(), file_type='mmtf')
View all attributes¶
In [ ]:
my_structure.get_dict()
Set chains that we are interested in (if any)¶
The mapped_chains
attribute allows us to limit sequence analyses to
specified chains (see the later section where we align a sequence to
this structure). For this
example, the ATP synthase is a complex of a number of protein chains,
and if we are interested in a specific gene transcript, we can set
those.
In [ ]:
# Chains A, B, and C make up ATP synthase subunit alpha - from the gene b3734 (UniProt ID P0ABB0)
my_structure.add_mapped_chain_ids(['A', 'B', 'C'])
Parse the structure to work with the Biopython Structure object¶
Parsing the structure will parse the sequences of each chain, and store
those in the chains
attribute. It will also return a Biopython
Structure object which opens up all methods available for structures in
Biopython.
In [ ]:
parsed_structure = my_structure.parse_structure()
print(type(parsed_structure.structure))
print(type(parsed_structure.first_model))
Clean the structure and save the structure¶
Cleaning a structure does the following: - Add missing chain identifiers to a PDB file - Select a single chain if noted - Remove alternate atom locations - Add atom occupancies - Add B (temperature) factors (default Biopython behavior)
In the example below, we will clean the structure so it only includes our mapped chains.
In [ ]:
cleaned_structure = my_structure.clean_structure(outdir='/tmp', keep_chains=my_structure.mapped_chains, force_rerun=True)
cleaned_structure
Viewing the structure¶
In [ ]:
# The original structure
my_structure.view_structure(recolor=False)
In [ ]:
# The cleaned structure
import nglview
nglview.show_structure_file(cleaned_structure)